A Provably Correct Stackless Intermediate Representation for Java Bytecode
Abstract
The Java virtual machine executes stack-based bytecode. The intensive use of an operand stack has been identified as a major obstacle for static analysis, and it is now common for static analysis tools to manipulate a stackless intermediate representation (IR) of bytecode programs. Several algorithms have been proposed to achieve such a transformation, but little attention has been paid to their formal semantic properties. This work provides such a bytecode transformation, describes its semantic correctness, and evaluates its performance with respect to the transformation time, the compactness of the obtained code, and the impact on static analysis precision. We provide the semantic foundations for proving that an initial program and its IR behave similarly, in particular with respect to object creation and throwing of exceptions. We formalize a notion of semantic preservation: an initial program and its IR have similar execution traces. Since the transformation does not preserve the object allocation order, the similarity between traces is defined using a relation between the two heaps. The correctness of this transformation is proved with respect to this semantic criterion.

Key-words: Static analysis, Bytecode languages, Program transformation

Work partially supported by EU project MOBIUS.
∗ Université de Rennes 1, Rennes, France
† CNRS, Rennes, France
‡ INRIA, Centre Rennes Bretagne Atlantique, Rennes

French title: A Provably Correct Register-Based Intermediate Representation for Java Bytecode

Résumé: The Java virtual machine executes bytecode programs using an operand stack. This intensive use of an operand stack has been identified as a major obstacle for static analysis. It is now common for modern static analysis tools to rely on a preliminary transformation that removes this use. Several algorithms have been proposed to carry out this transformation from bytecode to an intermediate representation (IR), but very little attention has so far been paid to their semantic properties. This work specifies such a transformation and provides the semantic foundations for proving that an initial bytecode program and its intermediate representation behave similarly, in particular with respect to object initialization and the throwing of exceptions. The transformation is based on a symbolic execution of the bytecode using a stack of abstract operands. Each bytecode instruction modifies the symbolic stack and gives rise to the generation of instructions in the IR language. We formalize a notion of semantic preservation: a program and its IR have similar execution traces. Since the transformation does not preserve the order of object allocation, this trace similarity is expressed by a correspondence relation on heaps. Finally, the semantic correctness of the transformation is proved with respect to this criterion.

Mots-clés: Static analysis, Bytecode language, Program transformation
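The abstract describes the transformation as a symbolic execution of the bytecode over a stack of abstract operands, where each instruction updates the symbolic stack and may emit IR instructions. The following is only a minimal sketch of that general idea, not the authors' algorithm: the four-instruction bytecode subset (`push`, `load`, `add`, `store`), the textual IR syntax, and the class name `SymbolicStackSketch` are all hypothetical, and the sketch ignores branches, exceptions, object creation, and the temporaries a real transformation would need when a stored variable still appears in a stacked expression.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

final class SymbolicStackSketch {

    /**
     * Translates a straight-line sequence of hypothetical stack instructions
     * into stackless assignments by symbolically executing the operand stack.
     */
    static List<String> transform(List<String> bytecode) {
        // Symbolic operand stack: holds expressions instead of runtime values.
        Deque<String> stack = new ArrayDeque<>();
        List<String> ir = new ArrayList<>();

        for (String insn : bytecode) {
            String[] parts = insn.split(" ");
            switch (parts[0]) {
                case "push": // push a constant: no IR emitted, stack grows
                case "load": // read a local variable: no IR emitted, stack grows
                    stack.push(parts[1]);
                    break;
                case "add": { // combine the two topmost expressions into a tree
                    String right = stack.pop();
                    String left = stack.pop();
                    stack.push("(" + left + " + " + right + ")");
                    break;
                }
                case "store": // write a local: emit one stackless assignment
                    ir.add(parts[1] + " := " + stack.pop());
                    break;
                default:
                    throw new IllegalArgumentException("unsupported instruction: " + insn);
            }
        }
        return ir;
    }

    public static void main(String[] args) {
        // Stack code for y = x + 1 collapses into a single assignment.
        List<String> ir = transform(List.of("load x", "push 1", "add", "store y"));
        ir.forEach(System.out::println); // prints "y := (x + 1)"
    }
}
```

In this sketch the four stack instructions yield one IR statement, which illustrates why rebuilding expression trees tends to give more compact code than assigning every stack slot to its own variable.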
Similar resources
Provably Correct Control-Flow Graphs from Java Programs with Exceptions
We present an algorithm to extract flow graphs from Java bytecode, including exceptional control flows. We prove its correctness, meaning that the behavior of the extracted control-flow graph is a sound over-approximation of the behavior of the original program. Thus any safety property that holds for the extracted control-flow graph also holds for the original program. This makes control-flow ...
Toward a Provably Correct Implementation of the JVM Bytecode Verifier
This paper reports on our ongoing efforts to realize a provably correct implementation of the Java Virtual Machine bytecode verifier. We take the perspective that bytecode verification is a data flow analysis problem, or more generally, a constraint solving problem on lattices. We employ Specware, a system available from Kestrel Institute that supports the development of programs from specifications to f...
Toward a Provably-Correct Implementation of the JVM Bytecode Verifier
This paper reports on our ongoing efforts to realize a provably-correct implementation of the Java Virtual Machine bytecode verifier. We take the perspective that bytecode verification is a data flow analysis problem, or more generally, a constraint solving problem on lattices. We employ Specware, a system available from Kestrel Institute that supports the development of programs from specifications,...
Toward a Provably-Correct Implementation of the JVM Bytecode Verifier
This paper reports on our ongoing efforts to realize a provably-correct implementation of the Java Virtual Machine bytecode verifier. We take the perspective that bytecode verification is a data flow analysis problem, or more generally, a constraint solving problem on lattices. We employ Specware, a system available from Kestrel Institute that supports the development of programs from specifications,...
Toward a Provably-Correct Implementation of the JVM Bytecode Verifier
This paper reports on our ongoing efforts to realize a provably-correct implementation of the Java Virtual Machine bytecode verifier. We take the perspective that bytecode verification is a data flow analysis problem, or more generally, a constraint-solving problem on lattices. We employ SPECWARE, a system available from Kestrel Institute that supports the development of programs from specifica...